An Information Theoretic Approach to Interaction-Grounded Learning
Xiaoyan Hu, Farzan Farnia, Ho-fung Leung
Reinforcement learning (RL) problems in which the learner attempts to infer an unobserved reward from some feedback variables have been studied in several recent papers. The setting of Interaction-Grounded Learning (IGL) is an example of such feedback-based RL tasks, where the learner optimizes the return by inferring latent binary rewards from its interaction with the environment. A relevant assumption in the IGL setting, adopted from the RL literature, is that the feedback variable $Y$ is conditionally independent of the context-action pair $(X,A)$ given the latent reward $R$. In this work, we propose Variational Information-based IGL (VI-IGL), an information-theoretic method that enforces this conditional independence assumption in the IGL-based RL problem. The VI-IGL framework learns a reward decoder by optimizing an information-based objective built on the conditional mutual information (MI) between $(X,A)$ and $Y$. To estimate and optimize this information-based term for the continuous random variables in the RL problem, VI-IGL leverages the variational representation of mutual information, which yields a min-max optimization problem. We further extend the VI-IGL framework to general $f$-information measures, leading to the generalized $f$-VI-IGL framework for IGL-based RL problems. Numerical results on several reinforcement learning settings indicate improved performance over the existing IGL-based RL algorithm.
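The min-max structure the abstract describes can be illustrated with a short sketch. The code below is not the authors' implementation: it pairs a hypothetical reward decoder with per-reward critic networks that maximize a Donsker-Varadhan lower bound on $I((X,A);Y\mid R)$, approximating the conditioning on the decoded reward via soft importance weights, while the decoder is trained to minimize the same estimate. All names (`RewardDecoder`, `Critic`, `conditional_dv_bound`) are hypothetical, and the remaining terms of the full VI-IGL objective are omitted.

```python
# Minimal sketch of a VI-IGL-style min-max update, assuming PyTorch.
import torch
import torch.nn as nn

class RewardDecoder(nn.Module):
    """psi: feedback y -> P(R = 1 | y)."""
    def __init__(self, y_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(y_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, y):
        return torch.sigmoid(self.net(y)).squeeze(-1)

class Critic(nn.Module):
    """T_r: scores (x, a, y) tuples for the DV bound, one critic per reward value."""
    def __init__(self, x_dim, a_dim, y_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + a_dim + y_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, a, y):
        return self.net(torch.cat([x, a, y], dim=-1)).squeeze(-1)

def conditional_dv_bound(critics, decoder, x, a, y, eps=1e-8):
    """Crude estimate of I((X,A); Y | R): for each r, a DV bound
    E_joint[T_r] - log E_prod[exp T_r], with expectations importance-weighted
    by the decoder posterior and the product of marginals approximated by
    shuffling y within the batch."""
    q1 = decoder(y)                        # P(R = 1 | y), shape [B]
    posts = [1.0 - q1, q1]
    perm = torch.randperm(y.size(0))
    total = 0.0
    for r in (0, 1):
        w = posts[r]
        joint = (w * critics[r](x, a, y)).sum() / (w.sum() + eps)
        w_prod = w * posts[r][perm]        # weight for p(x,a | r) p(y | r)
        t = critics[r](x, a, y[perm])
        log_marg = torch.logsumexp(t + torch.log(w_prod + eps), dim=0) \
            - torch.log(w_prod.sum() + eps)
        total = total + w.mean() * (joint - log_marg)
    return total

# Alternating updates: the critics ascend the bound, the decoder descends it.
x_dim, a_dim, y_dim, batch = 8, 4, 16, 256
decoder = RewardDecoder(y_dim)
critics = nn.ModuleList([Critic(x_dim, a_dim, y_dim) for _ in range(2)])
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-4)
opt_cri = torch.optim.Adam(critics.parameters(), lr=1e-3)

x, a, y = (torch.randn(batch, d) for d in (x_dim, a_dim, y_dim))

for _ in range(5):                         # inner critic ascent
    loss_c = -conditional_dv_bound(critics, decoder, x, a, y)
    opt_cri.zero_grad(); loss_c.backward(); opt_cri.step()

loss_d = conditional_dv_bound(critics, decoder, x, a, y)  # decoder descent
opt_dec.zero_grad(); loss_d.backward(); opt_dec.step()
```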
Interaction-Grounded Learning
Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
Consider a prosthetic arm learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions, which require an explicit reward. The learning agent observes a multidimensional context vector, takes an action, and then observes a multidimensional feedback vector; this feedback vector carries no explicit reward information. To succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, under certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.
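To make the interaction protocol concrete, here is an illustrative sketch, not the paper's algorithm: the learner observes a context, acts, receives only a feedback vector, decodes a latent reward with a stand-in decoder, and takes a REINFORCE-style policy step on the decoded reward. The environment, the threshold decoder, and every name below are hypothetical; in actual IGL the decoder itself must be learned jointly with the policy.

```python
# Illustrative IGL interaction loop, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
n_actions, x_dim, y_dim = 3, 5, 4

def environment_step(x, a):
    """Hypothetical environment: emits a feedback vector y whose
    distribution depends only on the latent reward r (the CI assumption)."""
    r = int(a == np.argmax(x[:n_actions]))   # latent reward, never revealed
    y = rng.normal(loc=2.0 * r, scale=1.0, size=y_dim)
    return y

def reward_decoder(y):
    """Stand-in decoder psi: thresholds the feedback mean."""
    return float(y.mean() > 1.0)

theta = np.zeros((n_actions, x_dim))          # linear softmax policy

for t in range(1000):
    x = rng.normal(size=x_dim)
    logits = theta @ x
    p = np.exp(logits - logits.max()); p /= p.sum()
    a = rng.choice(n_actions, p=p)
    y = environment_step(x, a)                # no reward is observed
    r_hat = reward_decoder(y)                 # ground a latent reward
    # REINFORCE-style update with the decoded reward:
    # grad log pi(a|x) = (e_a - p) x^T for a linear softmax policy
    grad = -np.outer(p, x); grad[a] += x
    theta += 0.05 * r_hat * grad
```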